Karen Spärck Jones (1935-2007)
Abstract
AI. Although she died on 4 April 2007, she worked up until a week before her death. Her major and most lasting contributions will almost certainly be her original PhD thesis and the inverse document frequency (idf) measure of the relevance of terms [1]. The latter is the notion that a document is relevant not only because key terms are frequent in it but also because those terms are infrequent in other, nonrelevant, documents. This idea is now a basic part of information retrieval.

Spärck Jones studied history at Cambridge but moved to philosophy (then called "moral sciences") in her last year. Her first published conference paper was "The Analogy between Mechanical Translation and Library Retrieval" [2], a title of great prescience for her career. At that time, it referred to using thesauri to resolve meaning problems in the two technologies, but the link preoccupied her all her life. She returned to this topic in "Information Retrieval and Artificial Intelligence" [3], arguing that AI in general, and natural language processing in particular, should make more use of information retrieval's statistical methodology.

In 1962, after a brief spell of teaching, Spärck Jones accepted Margaret Masterman's invitation to join the Cambridge Language Research Unit (CLRU) and started working toward her doctorate. Under the supervision of Masterman's husband, philosopher Richard Braithwaite, she wrote her thesis, "Synonymy and Semantic Classification" [4]. It was the first application of statistical clustering methods to lexical data (in her case, the whole of Roget's Thesaurus on punched cards) and was an ambitious attempt to create some notion of primitive concepts for machine translation on an empirical basis. Far ahead of its time [5], the work was not published until 20 years later, in the Edinburgh University Information Technology series [4], at which time Spärck Jones had to be persuaded it was still relevant.
This work is the ancestor of a range of empirical semantics research, from the semi-synonymous rows of terms (synsets) in WordNet to much later work on statistical clustering to determine semantic relationships. The historian in Spärck Jones added an extraordinary thesis appendix on artificial languages for coding meaning. She used the Theory of Clumps algorithms, which her husband Roger Needham developed and used in his own thesis work on automatic classification. In 1968, the need for more serious computer facilities took Spärck Jones from the CLRU to the …
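The idf idea described in the abstract, that a term counts for more when few other documents contain it, can be illustrated with a minimal Python sketch. It assumes the common textbook form log(N / n_t), which is not necessarily the exact formulation Spärck Jones published. The function names `idf` and `tf_idf`, the term-frequency weighting, and the toy documents are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def idf(term, documents):
    """Inverse document frequency in the textbook form log(N / n_t).

    N is the number of documents and n_t the number that contain the term.
    Returning 0.0 for unseen terms is an assumption of this sketch.
    """
    n_t = sum(1 for doc in documents if term in doc)
    if n_t == 0:
        return 0.0
    return math.log(len(documents) / n_t)


def tf_idf(term, doc, documents):
    """Raw term count in one document, scaled by the term's idf (assumed weighting)."""
    return Counter(doc)[term] * idf(term, documents)


# Toy collection (hypothetical): each document is a list of tokens.
docs = [
    "the thesaurus groups words by meaning".split(),
    "the retrieval system ranks the documents".split(),
    "the translation system uses a thesaurus".split(),
]

for term in ("the", "thesaurus", "retrieval"):
    print(term, round(idf(term, docs), 3))
```

In this toy collection, "the" appears in every document and so receives an idf of zero, while "retrieval", which appears in only one document, receives the highest weight of the three terms.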
Similar resources
Automatic summarising: The state of the art
This paper reviews research on automatic summarising in the last decade. This work has grown, stimulated by technology and by evaluation programmes. The paper uses several frameworks to organise the review, for summarising itself, for the factors affecting summarising, for systems,
Semantic primitives: the tip of the iceberg
Semantic primitives have been central to Yorick’s approach to language processing. In this paper I review the development of his ideas on the nature and role of primitives, considering them both from the narrower system point of view and in the larger context to which Yorick himself always referred.
Steps towards Natural Language to Data Language Translation Using General Semantic Information
In memoriam: Karen
Karen Spärck Jones, one of the pioneers of information retrieval (IR), died on 4 April 2007 at the age of 71. Her career history can be stated simply. She was born in Huddersfield in 1935 and brought up there before going to Cambridge University in 1953 to read history. With the exception of a brief period as a teacher after graduating, she spent her entire life at the University, initially as ...
Computational Linguistics: What About the Linguistics?
But this journal content is of interest here for another reason than these scholarly ones. First, references to computing are conspicuous by their absence. Just occasionally, grammar types or semantic models appear that have computational connections, for example in a shared view of feature sets; and there are references to computational corpus analysis, though more often in reviews than in maj...
Journal title:
- IEEE Intelligent Systems
Volume 22, issue
Pages -
Publication date 2007